
[SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior #17887

Closed
viirya wants to merge 11 commits into apache:master from viirya:add-config-fallback-string-parsing


viirya (Member) commented May 7, 2017

What changes were proposed in this pull request?

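A new SQL parser was introduced in Spark 2.0, and it unescapes all string literals during parsing. This causes a problem for regex pattern strings: escape sequences meant for the regex engine, such as \x20, are unescaped by the parser before the regex engine ever sees them.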
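To make the unescaping step concrete, here is a simplified, hypothetical model of what a 2.x-style parser does to the body of a SQL string literal. It is not Spark's actual parser code: the object name `UnescapeSketch`, the `unescape` helper, and the small escape table are illustrative assumptions. The behavior it models is that an unrecognized escape such as `\x` loses its backslash, so `\x20` reaches the regex engine as `x20`, while a doubled backslash collapses to a single one.

```scala
// Hypothetical, simplified model of 2.x-style string literal unescaping.
// This is NOT Spark's real implementation; it only mimics the rule that
// matters here: "\c" for an unrecognized c becomes just "c".
object UnescapeSketch {
  // A few common escapes (assumed subset); anything else drops its backslash.
  private val known: Map[Char, Char] = Map(
    'n' -> '\n', 't' -> '\t', 'r' -> '\r', '\\' -> '\\', '\'' -> '\'', '"' -> '"'
  )

  def unescape(body: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c == '\\' && i + 1 < body.length) {
        val next = body.charAt(i + 1)
        sb.append(known.getOrElse(next, next)) // unknown escape: keep only `next`
        i += 2
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    // Literal body the parser sees for rlike1 (single backslashes).
    val single = """^\x20[\x20-\x23]+$"""
    // Literal body the parser sees for rlike3 (doubled backslashes).
    val doubled = """^\\x20[\\x20-\\x23]+$"""
    println(unescape(single))  // ^x20[x20-x23]+$
    println(unescape(doubled)) // ^\x20[\x20-\x23]+$
  }
}
```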

The following code reproduces the issue:

```scala
// Assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._  // enables toDF() and the $"..." column syntax

val data = Seq("\u0020\u0021\u0023", "abc")
val df = data.toDF()

// 1st usage: works in 1.6
// Let the parser parse the pattern string
val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")
// 2nd usage: works in 1.6 and 2.x
// Call Column.rlike, so the pattern string is a literal that doesn't go through the parser
val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

// In 2.x, we need to add backslashes so that the regex pattern is parsed correctly
val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")
```
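The effect on matching can be checked with plain `java.util.regex`, without Spark. This sketch assumes that `rlike` ultimately compiles the pattern with `java.util.regex.Pattern` and tests it with `find()`; `RegexCheck` and its members are hypothetical names. `intended` is the pattern that `rlike2` passes through `Column.rlike`, and `afterUnescape` is what the 2.x parser turns the `rlike1` literal into.

```scala
import java.util.regex.Pattern

object RegexCheck {
  // Pattern the regex engine should receive (1.6 behavior, or the Column.rlike path).
  val intended = "^\\x20[\\x20-\\x23]+$"
  // Pattern it actually receives in 2.x after the parser drops the "\x" backslashes.
  val afterUnescape = "^x20[x20-x23]+$"

  // Assumed to mirror rlike: compile the pattern, then search for a match anywhere.
  def matches(pattern: String, s: String): Boolean =
    Pattern.compile(pattern).matcher(s).find()

  def main(args: Array[String]): Unit = {
    println(matches(intended, " !#"))      // true
    println(matches(intended, "abc"))      // false
    println(matches(afterUnescape, " !#")) // false: the row is silently filtered out
  }
}
```

The unescaped pattern is still a valid regex, so nothing fails loudly; the query just stops matching the rows it used to match in 1.6.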

Following the discussion in #17736, this patch adds a config to fall back to the Spark 1.6 string literal parsing behavior, to mitigate this migration issue.
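The fallback can be modeled as a single boolean switch on how a literal body becomes a string value. In the sketch below, `FallbackSketch`, `parseLiteral`, and the `escapedStringLiterals` parameter are hypothetical, and unescaping is simplified to "drop the backslash before any character". In released Spark versions, the SQL config providing this fallback is, to our knowledge, `spark.sql.parser.escapedStringLiterals` (default `false`); check the SQL configuration docs for your version before relying on it.

```scala
import java.util.regex.Pattern

object FallbackSketch {
  // Simplified 2.x-style unescaping: a backslash followed by any character
  // becomes just that character, so "\x" -> "x" and "\\" -> "\".
  private def unescape(body: String): String =
    body.replaceAll("\\\\(.)", "$1")

  // Hypothetical switch mirroring the new config:
  //   true  -> keep the literal body exactly as written (1.6-style)
  //   false -> unescape it first (the 2.x default)
  def parseLiteral(body: String, escapedStringLiterals: Boolean): String =
    if (escapedStringLiterals) body else unescape(body)

  def main(args: Array[String]): Unit = {
    // The literal body as written between the quotes in the rlike1 SQL text.
    val body = """^\x20[\x20-\x23]+$"""
    val legacy = parseLiteral(body, escapedStringLiterals = true)
    val unescaped = parseLiteral(body, escapedStringLiterals = false)
    println(legacy)    // ^\x20[\x20-\x23]+$
    println(unescaped) // ^x20[x20-x23]+$
    // With the fallback on, the 1.6-style query text yields a working regex again.
    println(Pattern.compile(legacy).matcher(" !#").find()) // true
  }
}
```

Keeping the default as the 2.x behavior means existing 2.x queries (such as rlike3) are unaffected unless the fallback is explicitly enabled.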

How was this patch tested?

Jenkins tests.

